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ON GALACTIC DENSITY MODELING IN THE PRESENCE OE DUST EXTINCTION 
Jo Hans-Walter Rix^, Gregory M. Green^, Edward F. Schlaely^, and Douglas P. Finkbeiner^ 

ABSTRACT 

Inferences about the spatial density or phase-space structure of stellar populations in the Milky 
Way require a precise determination of the effective survey volume. The volume observed by surveys 
such as Gaia or near-infrared spectroscopic surveys, which have good coverage of the Galactic mid¬ 
plane region, is highly complex because of the abundant small-scale structure in the three-dimensional 
interstellar dust extinction. We introduce a novel framework for analyzing the importance of small- 
scale structure in the extinction. This formalism demonstrates that the spatially-complex effect of 
extinction on the selection function of a pencil-beam or contiguous sky survey is equivalent to a low- 
pass filtering of the extinction-affected selection function with the smooth density field. We find that 
the angular resolution of current 3D extinction maps is sufficient for analyzing Gaia sub-samples of 
millions of stars. However, the current distance resolution is inadequate and needs to be improved by 
an order of magnitude, especially in the inner Galaxy. We also present a practical and efficient method 
for properly taking the effect of extinction into account in analyses of Galactic structure through an 
effective selection function. We illustrate its use with the selection function of red-clump stars in 
APOGEE using and comparing a variety of current 3D extinction maps. 

Subject headings: dust, extinction — Galaxy: kinematics and dynamics — Galaxy: structure — 
methods: data analysis — stars: statistics — surveys 


I. INTRODUCTION 

A common problem in the analysis of stellar surveys in 
the Milky Way is to account for the effects of astrophys- 
ical and procedural selection effects on the manner in 
which the Galactic volume is surveyed (e.g., Rix & Bovy 
2013). The latter effects primarily consist of the fraction 
of stars observed as a function of their position on the 
celestial sphere, stellar type, color, or magnitude. Exam¬ 
ples of the astrophysical effects include how interstellar 
extinction impacts the observed magnitudes of stars or 
the manner in which different stellar types trace the un¬ 
derlying stellar populations. Accounting for these selec¬ 
tion biases is essential for using stellar surveys to deter¬ 
mine the spatial or phase-space distribution of stars in 
our Galaxy. 

The basic formalism for fitting models of the den¬ 
sity or phase-space distribution has been reviewed by 
Rix & Bovy (2013). The formalism presented there is 
general, but in this paper we focus on understand¬ 
ing the formalism and its implementation in the case 
that the effect of interstellar extinction on the selec¬ 
tion function of a stellar survey is significant. To mo¬ 
tivate this, Eigure 1 displays the amount of interstellar 
extinction to 5 kpc over the whole sky from a combina¬ 
tion of the Marshall et al. (2006), Green et al. (2015), 
and Drimmel et al. (2003) extinction maps (see the Ap¬ 
pendix). It is clear from this map that any investigation 
of the phase-space distribution of the stellar disk near 
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the midplane is strongly affected by the highly filamen¬ 
tary structure of the extinction. Determining simple cuts 
to define a volume-complete sample of stars in this region 
of the Galaxy is difficult and sub-optimal. 

Below, we present a simple and efficient formalism for 
determining and understanding the effect of extinction 
on the effective volume of a survey. We do this for two 
cases: (a) a large-area survey such as Gaia that covers a 
significant fraction of the sky and (b) a pencil-beam sur¬ 
vey, such as the spectroscopic APOGEE survey (S. R. M. 
Majewski, et ah, 2015, in preparation), consisting of a 
number of small-area field pointings. While the formal¬ 
ism is different in practice, in both cases an important 
conclusion is that the effect of extinction on small spatial 
scales is unimportant when considering smooth models 
for the density or phase-space distribution of stars. 

The outline of this paper is as follows. § 2 reviews 
the basic likelihood-based formalism for inferring stellar 
phase-space distributions. We present the formalism for 
determining the effective survey volume necessary in the 
likelihood for large-area surveys and pencil-beam surveys 
in §§ 3.1 and 4.1, respectively. We give a detailed exam¬ 
ple of a Gaza-like large-area survey of red-clump (RC) 
stars in § 3.2. § 4.2 discusses the effect of extinction on 
the pencil-beam APOGEE volume selection of RC stars 
as an example of the formalism in § 4.1. We discuss and 
conclude in § 5. 

2. LIKELIHOOD-BASED DENSITY MODELING AND THE 
EFFEGTIVE SURVEY VOLUME 

As discussed in detail by Bovy et al. (2012) and 
Rix & Bovy (2013), the correct model of star counts is 
an inhomogeneous Poisson process. This process is char¬ 
acterized by a rate function A(0|6>) that specifies the 
model prediction for the number of stars as a function 
of O = (/, 5, D, m, c, [Fe/H]); the model is parameter¬ 
ized by parameters 0. Here we have written the observ¬ 
ables in a generic form consisting of: (a) the position 
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Fig. 1.— G-band extinction to 5kpc from the Sun, saturated at the approximate maximum Aq for which we can see the RC at G < 20 
(that is, in Gaia). For the APOGEE if-band observations, this extinction map saturates at Ah ~ 1.1, which is the maximum extinction 
to which the RC can be seen to 4.5 kpc in medium-deep APOGEE observations. The boundaries of the three extinction maps that are 
combined to obtain full-sky coverage (see the Appendix) are indicated. Much of the Galactic plane region is too extinguished for Gaia or 
APOGEE to be able to see the RC at 5 kpc and larger distances. 


in Galactic longitude, latitude, and heliocentric distance 
(/,6, H), (b) an apparent magnitude m, and (c) a color 
c and metallicity [Fe/H]. It is necessary to include the 
distribution of color (or multiple colors) and potentially 
metallicity when the absolute magnitude M of the stel¬ 
lar tracer being considered depends on these. We will see 
below that the formalism is especially simple in the case 
of a standard candle, that is, a stellar tracer for which M 
is constant; however it is straightforward to include the 
full dependence of M on color(s) and metallicity, or any 
other observable that affects the absolute magnitude. 

The rate function A(0|6>) is given by 

A( 0 |^) = z/*(x,y, z\e) X Ij(x,y,Z;/,6, d)\ 

X p(M, c, [Fe/H] |X, y, Z) x S{1, b,m), ^ ^ 

where z^*('|6>) is the spatial density in Galactocen- 
tric rectangular coordinates (X, y, Z) that we are ulti¬ 
mately most interested in and that depends on param¬ 
eters I J(X, y, Z;/, 6, D)| = cos 6 is the Jacobian 
of the transformation between (X^Y^Z) and (/,6,1)), 
p(M, c, [Fe/H] |X, y, Z) is the density of stars in absolute- 
magnitude-color-metallicity space (normalized to inte¬ 
grate to one), and 5'(/, 6, m) is the survey selection func¬ 
tion (the fraction of stars from the underlying population 
of potential targets observed by the survey). For simplic¬ 
ity we have assumed that the survey selection function 
does not depend on color, metallicity, or any other ob¬ 


servable, but including these is straightforward. 

As in Bovy et al. (2012), the rate has an additional 
amplitude parameter that gives the number density of 
stars at a reference location. The full log likelihood for 
the Poisson process is 

ln£(6») = ^lnA(Oi|6») - f dOA(0|6»). (2) 


The integral in this equation is the expected number of 
stars given the model parameters that provides the nor¬ 
malization of the rate likelihood. It does not depend on 
the individual data point, but is instead a property of 
the whole survey for a given model specified by 0. In 
what follows we refer to this integral as the effective vol¬ 
ume Heff of the survey^ because it is proportional (up to 
a constant, the inverse local density) to the traditional 
definition of the effective volume. 

In many cases, the overall amplitude parameter is un¬ 
interesting. To remove the amplitude parameter from 
further consideration, we can marginalize the probabil¬ 
ity of the parameters of the rate function over the ampli¬ 
tude of the rate. Using a prior on the amplitude that is a 
power-law with exponent a, the marginalized likelihood 
is given by 


ln£(6») = ^In 


~ KOi\oy 


(1 + a) In Feff . (3) 
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For a prior that is inversely proportional to the amplitude 
{a = —1), the second term in this equation is zero. The 
likelihood can be simplified to 


\nC{0) = In 

i 


MXi,yuz^\o) 


- (l + a) InVeE . (4) 


when the rate \{Oi\0) only depends on 0 through 
which is often a good assumption. Then u^{Xi^Yi^ Zi\0) 
is the only factor in \{Oi\0) that depends on 0] the other 
factors can be dropped. 

It is the effective volume that is difficult to compute in 
the presence of filamentary extinction and it is the focus 
of the remainder of this paper. 


3. THE EFFECTIVE SURVEY VOLUME FOR LARGE-AREA 

SURVEYS 

3.1. Formalism 

In this section, we work out the formalism for calcu¬ 
lating the effective survey volume in the presence of dust 
for a contiguous survey of a large part of the sky. This 
is useful for analyzing the spatial structure of the Milky 
Way from photometric surveys or for analyzing the full 
phase-space structure from Gaia data. We assume that 
the conversion between distance and absolute magnitude 
can be written as a function of an unreddened color and 
metallicity, although for a photometric survey the latter 
might be replaced by a second color or dropped entirely. 
We also assume that the selection function is only a func¬ 
tion of magnitude—we denote this magnitude in this sec¬ 
tion by G with Gaia in mind—although the generaliza- 

I 


tion to include color dependence (whether corrected for 
extinction or not) is straightforward. For contiguous sur¬ 
veys the selection function typically does not explicitly 
depend on sky position and we write it as 5'(G[-]), al¬ 
though it could be easily incorporated in what follows^; 
we use the argument G[-] to indicate the dependence of 
G on (/, 6, T), c, [Fe/H]), but we do not always write these 
variables explicitly to keep the expressions cleaner. We 
first write out the integration over O in the effective vol¬ 
ume in equation (4) explicitly 

u,{x,Y,z\e) 

X 11 dcd[Fe/R]p{c,[Fe/R])S{G[-]) , 

(5) 

where we have used that | J(X, T, Z; / , 6, T))| = cos 6. 
The integral over (c, [Fe/H]) is the integration over the 
absolute-magnitude distribution and it can be efficiently 
computed using Monte Carlo integration (see below). We 
can then write the density z^*(X, Y^ Z\0y as z^*(/, h^D\0) 
and write its dependence on (/,6) using an orthonor¬ 
mal basis expansion, say in terms of spherical harmonics 
Yp{^l2-h,l) 

oo I 

v.{l,b,D\e) = Y, Y ’^*,em{D\0)Yr^{7r/2-b,l). (6) 

£=0 m =-£ 

Then the effective volume becomes 


J d0A(0|6») = /// dl db dD cos b 


oo £ 


f dO\{O\0) = Y Y [<iDv^,em{D\e) X //dcd[Fe/H] p(c, [Fe/H]) x f f dldb cosbYr{T^/2 - b,l) S{G[-]), 

£=Q m=-£’^ ^ ^ ^ ^ 

(7) 


(where all of the integrations are nested, not indepen¬ 
dent) and we can introduce the effective selection func¬ 
tion ^£m of order (^, m) 


nm{D) = jj dcd[Fe/H] p(c, [Fe/H]) 

/y'dM6 coshVr{TT/2-h,l)S{G[-]). 


(general case; 8) 


If p(c, [Fe/H]) depends on (l^b^D)^ then that depen¬ 
dence can be incorporated by moving the integration over 
p(c, [Fe/H] |/, 6, T)) into the integral over (/, 6). For a stan- 


® The boundaries of a partial-sky, contiguous survey can be in¬ 
corporated by restricting the integration area in (/, h) below to that 
of the survey, although sharp survey boundaries should be apodized 
to avoid ringing in the basis-function expansion below 

^ When analyzing the full six-dimensional phase-space distri¬ 
bution this density is replaced by the integration of the 6D DF 
over velocity, but the formalism given here still applies when the 
selection function is independent of kinematics (which it typically 
is). 


dard candle, equation (8) simplifies to 

&e^{D) = JJ cosbYp{TT/2-b,l)S{G[l,b,D]). 

(standard candle; 9) 

The effective volume in terms of becomes 

/ rt OO 

d0A(016») = dDD^ Y Y ^*,£m {D\0) ^£jn{D) . 

£=0 m =-£ 

( 10 ) 

Equation (10) makes it clear that the effective survey 
volume is essentially the cross-correlation of the density 
with the effective selection function. Because the Galac¬ 
tic stellar phase-space density is much smoother than the 
extinction map, this cross correlation is essentially a low- 
pass filtering of the extinction-affected selection function 
with the density that limits the influence of the small- 
scale power in the interstellar dust distribution. 

We discuss the precision to which the effective survey 
volume can be computed as a function of the angular and 
distance resolution in the 3D extinction map below. The 
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Fig. 2. — Distribution of the absolute G-band magnitude derived 
from PARSEC isochrones for an RC sample selected using the cuts 
from Bovy et al. (2014) and assuming a solar-neighborhood metal- 
licity distribution. The distribution of Mq peaks at Mg = 0.68 
with an approximate width of 0.08 mag. 

required angular resolution depends on the sample size, 
the complexity of the model, the sky coverage, etc. and 
it can be determined from an analysis of the cross corre¬ 
lation in equation (10) as in the example below. Once a 
good angular resolution is determined one will in prac¬ 
tice prefer to work with the effective selection function in 
position space. For this we define the effective selection 
function 0(/, 6, D) 

&{l,b,D) = 11 dcd[Fe/H] p{c, [Fe/H]) S{G[-]), 

(general case; 11) 

which can be efficiently computed using Monte-Carlo in¬ 
tegration. For a standard candle we simply have that 
0(/,6,1)) = 5'(G[']). In both cases the (/,6) dependence 
of 0(/,6,1)) is through the (/,6) dependence of the ex¬ 
tinction that gives G = Mq + /i + Ag{1, where Mq 

is the absolute magnitude and fi is the distance modulus. 
The effective volume is then simply 

J dOA(0|6») = 

d; dbdDD^ cosbi^4X,Y,Z\e)&{l,b,D). 

( 12 ) 

Thus, in practice we do not need to work in Fourier 
space, but the Fourier space analysis is helpful to eluci¬ 
date the importance of small-scale power in the extinc¬ 
tion map and to determine an appropriate resolution to 
discretize the integration in equation (12). 

3.2. Gaia example 

In this section we work through an example of the for¬ 
malism described above to determine the impact of in¬ 
terstellar extinction on phase-space inferences from Gaia. 
We assume that the stellar tracers for which we are fitting 
the density distribution are RC stars, which are close to 
the standard candle case. For the purpose of this exam¬ 
ple, we estimate the absolute magnitude distribution of 
RC stars in the Gaia G band as follows. We use PARSEC 


isochrones (Bressan et al. 2012) and the transformation 
between {g^g — z) and G from Jordi et al. (2010) to de¬ 
termine the mean absolute magnitude of stars of a given 
age and overall metallicity [Fe/H] that satisfy the RC se¬ 
lection criteria of Bovy et al. (2014). We find that the 
spread at a given age and metallicity is small and fur¬ 
thermore that the age dependence is small. Therefore, 
we determine MG'([Fe/H]) as the mean Mq at an age of 
5 Gyr (thus effectively ignoring the color dependence in 
the integrals above). We use the local metallicity dis¬ 
tribution (Hayden et al. 2015) to determine the distri¬ 
bution of Mq from this. The resulting distribution of 
Mg is displayed in Figure 2. The distribution peaks at 
Mg = 0.68 and is narrow with an approximate width of 
0.08 mag. We stress that this is a simplified treatment 
of the absolute magnitude distribution, which will in de¬ 
tail depend on the position in the Galaxy through its 
dependence on age and metallicity. This dependence can 
be taken into account using the formalism above. We 
use 10,000 Monte Garlo samples from the distribution in 
Figure 2 to approximate the integration over the intrinsic 
absolute magnitude distribution above. 

We consider a Gaza-like survey of RG stars, with a se¬ 
lection function that is constant over the full sky between 
G = 3 and G = 20 and compute the effective volume for 
a density model that consists of a radial and vertical ex¬ 
ponential disk with a scale length of 3kpc and a scale 
height of 300 pc. The ingredients of the effective area— 
the angular part of the integration in equation (10)— 
at two different distances are shown in Figure 3. The 
top panels display the angular power spectra of all of 
the ingredients: the density distribution, the extinction 
distribution, and the effective selection function of equa¬ 
tion (8)^. While most of the power in the extinction 
map is on large scales, the power on smaller scales only 
slowly diminishes. The red line that shows the angular 
power spectrum of the effective selection function demon¬ 
strates that most of the power in the extinction map is 
transferred to 0. However, the density of a typical expo¬ 
nential disk has a much stronger decline in power toward 
small scales. The middle panels display the cross power 
spectrum between the extinction map and the density as 
well as that between 0 and the density. As expected, on 
large scales where the density power spectrum is close to 
constant, the cross power spectrum follows that of the 
extinction or B, but on small scales the power is signifi¬ 
cantly reduced due to the lack of small-scale power in the 
density. The bottom panels of Figure 3 show the impact 
of the resolution of the extinction map when computing 
the effective survey volume. These panels display the er- 

® The Cl and computed using HEALPix (Gorski et al. 

2005) using the standard definition 

' m 

and 

z^^cross _ ^ \ ^ _ L* 

~ 2 ^ + 1 ^ ’ 

' m 

where ai^ and bi^ are the coefficients of the decomposition into 
spherical harmonics of two maps. We do this by evaluating the 
maps at the highest resolution of the Green et al. (2015) map 
inside = 2048). The pixelation affects the power spectra at the 
largest ^ > 2 x 10^; we do not attempt to correct for this here. 
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Fig. 3.— Top panels: Power spectrum of an exponential disk (blue; Hr = 3kpc, hz = 0.3kpc), the combined extinction map (see the 
Appendix) (black), and the selection function of a Gam-like RC sample (3 < G < 20; red) at 5 kpc (left panels) and 6.3kpc (right panels). 
Middle panels: Cross power-spectra of the density with the extinction map (black) and the selection function (red). Bottom panels: error 
in calculating the natural logarithm of the effective area when only including terms up to spherical degree i in the calculation: the effective 
area is the cross-correlation of the density and the selection function. The pixelization of the extinction maps affects the power spectra at 
i > 2 X 10^. The density acts as a low-pass filter on the selection function, whose integrated power would otherwise only slowly converge 
when going to smaller scales. 


ror in computing the effective survey volume when only 
including terms up to spherical degree i in the calcula¬ 
tion, that is, when limiting the resolution to ~ 180°/-^. 

When performing a likelihood-based inference we need 
to be able to compute the relative log likelihood of mod¬ 
els within — 2Aln£ = C^(l) of each other to 

a precision that is much smaller than 1. This is such 
that errors in the likelihood computation do not affect 
the uncertainty assigned to model parameters (which 
are roughly assigned based on Ay^ « Because 

the effective volume normalizes the probability for each 
data point (see equation [4]), this requires us to compute 
the logarithm of the effective volume of models within 
Ay^ ^ 0{1) to a precision much better than 1/A for 
a sample of N data points. The bottom panels of Fig¬ 
ure 3 display the absolute precision in the effective vol¬ 


ume (that is, the precision of a single effective-volume 
calculation); if this is smaller than 1/A, then the rela¬ 
tive precision—the precision of the logarithmic difference 
between the effective volumes of two similar models—will 
also be computed precisely enough. For a Gam-sized sur¬ 
vey of RC stars, A ^ 10^, which requires an extinction 
map at a resolution of < 10' (^ ~ 1,000). This is close to 
the resolution in 3D extinction maps that exists in cur¬ 
rent maps (see the discussion in the Appendix). Extrap¬ 
olating the behavior at ^ < 2 x 10^—where it is not af¬ 
fected by the pixelization of current extinction maps—in 
the bottom panels of Figure 3, we find that the full Gaia 
data set (A ^ 10^) requires a resolution < 2', which is at 
the limit of current extinction maps. Modest improve¬ 
ments in the currently available 3D extinction maps will 
therefore suffice for computing the effective area for any 
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Fig. 4. — Power spectrum of the effective area as a function of 
distance modulus, computed for an exponential disk {hji — 3kpc, 
hz = 0.3 kpc) and the selection function of a Gam-like RC sample 
(3 < G < 20). This power spectrum can be interpreted as the error 
made in the computation of the effective volume as a function of the 
limited distance sampling of 3D extinction maps. The black curve 
displays the full-sky effective area and the red curve excludes a 
rectangular inner 25° x 25° box. The error in the effective volume 
due to the poor distance sampling of current extinction maps is 
^ 10~^ over the full sky, but dominated by the highly-extinguished 
inner Galaxy; excluding the central region, currently the error is 
10“^. Extrapolating the trends with resolution (dashed lines), an 
order of magnitude improvement in the distance resolution should 
lead to errors <10“® over the full sky and < 10“^® excluding the 
central region, which is good enough for large-scale Gaia analyses. 

smooth model for the phase-space distribution of stars 
constrained using Gaia data. Analyses using samples 
with small sizes do not gain from using a high-angular- 
resolution effective selection function; an analysis such as 
that presented in the bottom panels of Figure 3 can be 
used to determine a good resolution to degrade the effec¬ 
tive selection function to, to minimize the computational 
cost for computing the effective selection function. 

Computing the effective volume requires us to inte¬ 
grate the effective area over distance (see equation [10]). 
While the angular resolution of 3D extinction maps is 
high, the distance resolution of these maps is poor due 
to, for example, uncertainties in the distances to the stel¬ 
lar tracers used to determine the extinction. The spatial 
sampling in distance modulus of the Green et al. (2015) 
and Marshall et al. (2006) maps is ^ 0.5 mag, although 
we note that this is under sampled by a factor of ~ 2, in 
that a finer distance sampling is possible for the same 
data. This resolution is much worse than the angular 
resolution of these maps (^ 10'-15'; see the Appendix), 
which corresponds to a relative distance resolution of less 
than 1 %. However, because extinction along a line of 
sight is a cumulative quantity, the behavior of extinction 
with distance is typically smoother than the angular de¬ 
pendence. The impact of the distance resolution on the 
error in the effective volume should then be less severe. 

To assess the error in the effective volume due to dis¬ 


tance resolution, we have computed the effective area 
for the same exponential disk model and Gaza-like RC 
selection as above at all of the distance moduli of the 
combined extinction map of the Appendix. Figure 4 
shows the power spectrum—calculated using a standard 
periodogram estimate—of the effective area as a func¬ 
tion of distance modulus (this includes an extra factor 
of distance because of the transformation between dis¬ 
tance and distance modulus). The power spectrum at 
a given resolution is the approximate error due to the 
ignorance of sub-resolution fluctuations in the area; this 
approximation works well because the power spectrum 
steeply declines with increasing k. An estimate of the 
error in computing the effective volume using Simpson’s 
rule agrees with the power spectrum estimate. The black 
curve in Figure 4 displays the power spectrum using the 
full-sky effective area. This leads to an error on the 
0.5 mag spatial sampling of ~ 10“^. An investigation 
of the angular dependence of the integrand reveals that 
much of this error comes from the inner region of the 
MW, where both the extinction and the density of an 
exponential disk are high and small-scale structure in the 
extinction has a significant impact on the effective vol¬ 
ume. Excluding the inner regions (the red curve in Fig¬ 
ure 4) removes much of the error and leads to an error of 

10“^. The error also decreases about twice as fast with 
increasing resolution outside of the inner regions. In¬ 
creasing the distance sampling above that of Green et al. 
(2015) by a factor of ten, which can be achieved within 
a few years using the precise Gaia parallaxes for many 
stellar-extinction tracers, leads to errors of ^ 10“^ (full 
sky) and ^ 10“^^ (excluding the central regions). The 
latter is good enough to allow sensitive measurements of 
the disk’s structure and asymmetries using the full Gaia 
catalog, but studying the inner regions with large Gaia 
samples will be challenging. 

The discussion so far has focused on angular and dis¬ 
tance precision. However, accuracy is as important for 
disentangling intrinsic structural properties of the Milky 
Way’s stellar distribution from those of the dust distri¬ 
bution. In Figure 5 below, we compare the distribution 
of near-infrared (NIR) extinction in a few lines of sight 
for four existing 3D extinction maps. While overall the 
agreement is good, differences between different maps 
are much larger than the precision of each of the maps. 
In particular, extinction maps constructed largely using 
optical data (Sale et al. 2014 and Green et al. 2015) dis¬ 
agree with IR-based maps in regions of high extinction. 
Because the high-extinction regime is the most impor¬ 
tant for the error in the effective volume, future extinc¬ 
tion maps will need to incorporate IR data to properly 
sample this regime. 

4. THE EFFECTIVE SURVEY VOLUME FOR 
PENCIL-BEAM SURVEYS 

4.1. Formalism 

In principle the same formalism as presented above 
could be applied to a pencil-beam survey. However, 
the sharp edges of each celestial pointing in pencil-beam 
surveys would introduce significant ringing in the effec¬ 
tive selection function at each order, while apodizing the 
boundary of each field would require much data to be 
discarded. Pencil-beam surveys are closer to the case of 
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Fig. 5.— Distribution of extinction Ah as a function of distance for four different three-dimensional extinction maps and eight represen¬ 
tative APOGEE pointings. The solid line for each map displays the mean extinction over the area of an APOGEE pointing (which have 
radii of 1.49°, except for the (I, b) = (12.0, 2.0) pointing that has a radius of 1°) and hatched regions demonstrate the 1 sigma range of the 
distribution. The dots are data points from the APOGEE-RG catalog. Overall the different extinction maps agree well with each other and 
with the data (the large discrepancies between the Sale et al. (2014)/Green et al. (2015) and Drimmel et al. (2003)/Marshall et al. (2006) 
maps are beyond where the former maps are considered reliable). 










































a set of delta-function pointings on the sky, where we 
would compute the effective volume by summing over 
these pointings. 

As an example of a pencil-beam survey we will con¬ 
sider the selection function of RC stars in the APOGEE 
survey. Eor this survey the survey selection function is 
accurately known (Bovy et al. 2014) and it is a function 
of the field and apparent H magnitude (not corrected for 
extinction). That is, S = 5'(field,i7). The RC is close 
to a standard candle in the NIR, with variations as a 


function of color and metallicity of ~ 0.1 mag. Never¬ 
theless, we will continue to present the formalism for the 
general case where there are variations of the absolute 
magnitude with color c (typically c = {J — Kg)o for the 
NIR RC) and [Ee/H]. With APOGEE in mind, we use H 
instead of the generic magnitude m in this section and 
we again replace the arguments {l^b^D^c, [Ee/H]) of H 
by [•] in many expressions to avoid notational clutter. 

The effective volume for a pencil-beam survey can be 
written as a sum over field locations 


J d0A(0|6»)= JdDD‘^i^4[X,Y,Z]{D,&eld)\e) x JJ dcd[Fe/H] p(c, [Fe/H]) x JJ dldb cosbS{&eld,H[-]), 

fields 

(13) 


where we have assumed that the density u^{l,b^D) is 
sufficiently smooth that it can be considered constant 
over the extent of the field. When this is not the case, 
z/*(/, 6, D) could be expanded in terms of basis functions 
around the center of each field in much the same way 
as in equation (6), which would introduce higher-order 
effective selection functions below, in the same manner 
as in equation (8). We further assume that the absolute- 
magnitude spread of the stellar tracer does not have sig¬ 
nificant variations within a field; field-to-field variations 
could be included, but are also neglected here. 

Erom the expression in equations (13), it is clear that 
the three-dimensional distribution of extinction only en¬ 
ters in the integration over (/,6). We then again intro¬ 
duce the effective selection function B. Eor a standard 
candle, B is the selection function averaged over (/, b) 

©(field,D) = jj cosbSifLeld,H[l,b,D]), 

(standard candle; 14) 


where Cl is the area of the (/, 6) integration. If the tracer 
stellar population is not a standard candle, we absorb 
the integration over (c, [Ee/H]) into the definition of the 

I 

©(field, D) = Y, 5'(field, k) " 

k 


effective selection function 


©(field, D)= Jj dcd[Fe/H] p{c, [Fe/H]) 

X JJ cosbS{de\d,H[l,b,D,c, [Fe/H]]); 


(general case; 15) 


This integration over (c, [Ee/H]) can again be easily per¬ 
formed using Monte Carlo integration. 

With this definition of the effective selection function, 
the effective volume becomes simply 

J do\{o\e) = 

f<iDD^u4[X,Y,Z]{D,de\d)\9)&{&eld,D). 

fields 

(16) 


4.2. The effective selection function of the 
APOGEE-RC sample 

The APOGEE selection function is a constant 
5'(field,/c) within each bin /c of a small number of 
magnitude bins [i^min,/c: ^max,/c] (Zasowski et al. 2013; 
Bovy et al. 2014). Therefore, we can simplify B to 


Ho{D) < Anjl^b.D) < - Ho{D)) 


(17) 


for a standard candle with Hq = Mh + /i, where 
Mh is the absolute magnitude of the standard can¬ 
dle and /i is the distance modulus; fl(i^min,/c — Hq < 
Ah{D) < — Hq) is the area of the field with 

Ah between the given boundaries and Clf is the total 
area of the field (not all APOGEE fields have the same 
area). Eor a non-standard candle we additionally inte¬ 
grate over (c, [Ee/H]), which in the APOGEE-RC case 
are ([J — i^s]o, [Ee/H]). In this expression, we have sup¬ 
pressed the dependence of Ah on (/,5) for clarity. The 
APOGEE selection function S and effective selection 


function B for a standard candle or any tracer with a 
known Mh distribution is implemented as part of the 
apogee Python package at 

http://github.com/j obovy/apogee 
as part of its apogee . select. apogeeSelect module. 

Thus, at any given distance the effective selection func¬ 
tion only depends on a smoothed version of the two- 
dimensional extinction distribution at that distance: the 
fractional area of the field in wide (~ 1 mag) bins in 
Ah{D). Eor the RC sample that we use here the depen¬ 
dence of Hq on ([J — Ks]o, [Ee/H]) is weak (< 0.1 mag; 
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see Figure 3 of Bovy et al. 2014), such that we can treat 
the RC as a standard candle. The weak dependence of 
the RC absolute magnitude implies that any density or 
phase-space fits are only very weakly dependent on the 
color-metallicity distribution of stars in the sample. 

To illustrate the pattern of extinction that affects 
APOGEE observations, we can interpret the extinction 
displayed in Eigure 1 as the il-band extinction between 
0 and 1.1 {AgIAh ~ 5.1). This upper limit corresponds 
to the maximum extinction for which we can see the 
RC to ~ 4.5 kpc in a medium-deep APOGEE pointing 
(H < 12.8). This gives a sense of the volume over which 
the RC can be mapped with APOGEE. RC stars can 
be seen to 4 kpc over almost the whole sky, while at 
5 kpc and further much of the disk region is obscured. 
APOGEE has deep pointings (down to H < 13.8) along 
a few lines-of-sight that allow for some of this highly- 
extinguished structure to be penetrated. 

In Eigure 5 we demonstrate the range of extinction val¬ 
ues in eight APOGEE pointings for four different three- 
dimensional extinction maps. The default extinction 
model that we use is that of Green et al. (2015), which 
is displayed in blue. Each curve shows the mean ex¬ 
tinction within the APOGEE pointing and the hatched 
regions display the la range of the extinction distribu¬ 
tion. Other three-dimensional extinction maps are com¬ 
pared to our default model: the yellow curve displays 
the extinction model of Drimmel et al. (2003), which is 
a fully-sky map with an angular resolution of ^ 20'; the 
red curve shows the 3D map of Marshall et al. (2006) 
which is available over —100° < I < 100° and |6| < 10°; 
and the cyan curve gives the map of Sale et al. 2014, 
available over 30° < I < 215° and \b\ < 5°. The agree¬ 
ment between the recent extinction maps of Sale et al. 
(2014) and Green et al. (2015) is remarkably good, but 
they both typically give smaller extinction close to the 
Galactic plane than the Marshall et al. (2006) map. The 
APOGEE-RC data themselves provide good estimates of 
extinction values at the range of distances included in our 
sample and our data are displayed as black dots. Overall 
the agreement between the extinction maps and our data 
is good; however, our extinction vs. distance data do 
not allow us to unambiguously prefer the Marshall et al. 
(2006) map over the Green et al. (2015) map at low 
Galactic latitudes. 

While the differences between the different 3D extinc¬ 
tion maps can be quite substantial, especially at large 
distances, they only matter for any density inference in 
as much as they lead to different effective selection func¬ 
tions 0>(field, D). Eigure 6 demonstrates the effective 
selection function for the same eight pointings as dis¬ 
played in Eigure 5. At Galactic latitudes above a few de¬ 
grees, the different maps lead to similar effective selection 
functions, even though the extinction can be substantial 
{Ah ~ 1 near the faint end of the survey). We have 
included one APOGEE pointing in the inner Galaxy at 
(/,6) = (12,2): these locations were only observed down 
to H < 11 and therefore the effective selection function 
only reaches to ~ 3 kpc; the different extinction maps 
all return the same 0 for such inner-Galaxy pointings. 
Within a few degrees of the mid-plane, larger differences 
occur between the effective selection function calculated 
using the Marshall et al. (2006) map and the Green et al. 
(2015)/Sale et al. (2014) maps, especially at I < 60°. 


All but one of the curves in Eigure 6 are computed as¬ 
suming that the RC is a standard candle. The dashed 
green line demonstrates what happens when we relax this 
assumption for the Green et al. (2015) map. Eor all but 
the least obscured pointings, the small ~ 0.1 mag width 
of the RC absolute magnitude has no effect on the effec¬ 
tive selection function, essentially because the small-scale 
structure in the extinction map washes out this small 
spread. Only at high Galactic latitude is the extinction 
small enough that the small Mh spread in the RC can 
affect the effective selection function. 

We estimate the error in the calculation of the effective 
volume due to the distance integration in the same man¬ 
ner as for the Gaia example above for the eight fields 
of Eigures 5 and 6. We find that the current error is 
^ few X 10“^, decreasing by ^ 10“^ per decade of im¬ 
proved distance resolution. These errors are larger than 
for the Gaia example above, mainly because of the much 
narrower magnitude bins employed by APOGEE and due 
to apogee’s focus on the mid-plane, where the ex¬ 
tinction has more small-scale structure. Analyses using 
hundreds of tracers are therefore unaffected by the cur¬ 
rent distance distance sampling. Using larger samples 
may lead to systematic uncertainties due to the limited 
distance sampling in current extinction maps, but again 
the relevant error is the relative error between models 
that fit the data similarly well (Ay^ = 0{1)) and this 
is likely much smaller than the absolute error discussed 
here. Mock-data tests for an APOGEE-like sample of 
20,000 stars by Bovy et al. (2016) indicate that such sam¬ 
ples are also currently unaffected by the distance resolu¬ 
tion. As discussed above, improvements in the distance 
resolution by about a factor of 10 are likely in the near 
future, such that spectroscopic samples close to the mid¬ 
plane of ^ 10^ stars should be able to be analyzed; for 
spectroscopic surveys that avoid the mid-plane (i.e., op¬ 
tical surveys) the requirements are less severe. 

5. CONCLUSIONS 

Motivated by the advent of near-infrared spectroscopic 
surveys and Gam, we have discussed likelihood-based 
phase-space inference in the MW in detail in this pa¬ 
per, focusing in particular on the effects of interstellar 
extinction. Central to this inference is the effective sur¬ 
vey volume, which acts as a normalization factor in the 
likelihood. In principle the computation of this effec¬ 
tive volume is straightforward: one simply integrates the 
density at (/,5, D) multiplied by the selection function, 
which includes the effects of extinction, over the full MW 
volume. In practice, this integration is expensive and the 
uncertainties due the angular and distance resolution in 
the extinction map have been difficult to assess. 

In § 3, we have introduced a novel formalism that 
introduces the effective selection function to assess the 
impact of angular and distance resolution on the com¬ 
putation of the effective volume for a large-area survey 
such as Gaia. We have found that the angular resolu¬ 
tion of current extinction maps is sufficient to analyze 
Gaia samples consisting of millions of stars in terms of 
smooth, axisymmetric density distribution. Only mod¬ 
est and realistic improvements in the angular resolution 
are necessary to analyze the full Gaia data set (« 10^ 
stars; in case one is so inclined). More limiting than the 
angular resolution is the distance sampling of current 
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Fig. 6.— Effective selection function 0>(location, D) for eight APOGEE pointings. The dashed line displays the raw selection function, 
which is a piecewise-constant function over one, two, or three magnitude ranges, translated from H to distance assuming Mh = —1.49 
for the RC. The blue solid line shows the effective selection function calculated using the Green et al. (2015) 3D extinction map assuming 
that the RG is a standard candle, while the dot-dashed green line takes into account the spread in Mh in the RG for the whole APOGEE- 
RG sample. The uncertainty in this effective selection function as calculated from the Green et al. (2015) MGMG samples is smaller 
than the width of the line. The effective selection function for three other extinction maps is also shown where these maps are available 
(Drimmel et al. 2003: full sky; Marshall et al. 2006; —100° < I < 100° and |5| < 10°; Sale et al. 2014: 30° < I < 215° and \b\ < 5°). 
The Marshall et al. (2006) map generally leads to a shallower effective selection function, because it has on average higher extinction (see 
Figure 5). The agreement between the two most recent extinction maps (Sale et al. 2014 and Green et al. 2015) is remarkably good. 
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maps. The current distance sampling is about 0.5 mag, 
or about 25 % in distance, which causes an uncertainty 
in the effective volume of ^ 10“^ to 10“^, depending on 
how much of the inner MW is included. From an analysis 
of the small-scale structure in the distance integration, 
we expect this uncertainty to decrease by three orders of 
magnitude or five (excluding the inner MW) for an im¬ 
provement in the distance resolution by a factor of ten. 
Such an improvement appears plausible with the pre¬ 
cise Gaia distances for large numbers of stellar tracers of 
the extinction and allows sensitive analyses of the disk’s 
structure with the full Gaia data set. Analyzing the non- 
axisymmetric component of the stellar disk that is less 
smooth than the exponential disk models considered here 
would place somewhat more stringent conditions on the 
angular and distance resolution of extint ion maps; these 
can be evaluated with a similar formalism as introduced 
here. Extinction maps also need to attain a higher ac¬ 
curacy, because current extinction maps differ by much 
more than their stated uncertainties in regions of high 
extinction. This requires a larger contribution from IR 
data in the creation of 3D extinction maps. 

We have also discussed the formalism for phase- 
space inference of non-contiguous, spectroscopic sur¬ 
veys. An application of this formalism is given in 
Bovy et al. (2016), who analyze the density distribu¬ 
tion of abundance-selected populations using RC stars 
in APOGEE. We have demonstrated that the 3D extinc¬ 
tion map can be fully and efficiently taken into account in 
likelihood-based phase-space inferences using an effective 
selection function under the assumption that the density 
is close to constant within each spectroscopic pointing. 

Our formalism makes it clear that in both the contigu¬ 
ous and the non-contiguous case the small-scale structure 
in the 3D extinction map is filtered by the smooth density 
field. This is because the effective survey volume is essen¬ 
tially the cross-correlation of the smooth (phase-space) 
density and the effective selection function; as the den¬ 
sity is very smooth, this cross-correlation is effectively a 


low-pass filtering of the effective selection function, which 
removes the influence of the patchy extinction map. 

Progress in obtaining better 3D extinction maps in 
the future will require a balance between angular and 
distance resolution because of the finite number of stars 
available to use as tracers. Essentially, one can use larger 
angular bins to obtain a larger number of stars to in¬ 
fer the distance dependence of the extinction, although 
in practice a better approach is to require a high de¬ 
gree of correlation between adjacent, small angular bins 
or by using Gaussian processes (e.g.. Sale & Magorrian 
2014). Erom our analysis we can conclude that im¬ 
proving the distance resolution—especially in the high- 
extinction, inner Galaxy—is much more important for 
studies of Galactic structure than improving the angular 
resolution. This is especially important for future analy¬ 
ses that aim to take advantage of the full Gaia catalog to 
tease out small signals of, e.g., phase-space asymmetries. 
Such analyses require the best smooth background mod¬ 
els and the ability to compute the background model to 
an accuracy of ~ 10“^. 
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APPENDIX 

A THREE-DIMENSIONAL EXTINCTION MAP OVER THE FULL SKY 

In recent years a number of three-dimensional maps of the integrated extinction out to many kpc based on modeling 
stellar photometry have appeared (e.g., Marshall et al. 2006; Sale et al. 2014; Green et al. 2015). None of these cover 
the full sky. To illustrate the formalism for accounting for the effect of extinction on the volume selection of stellar 
surveys presented in this paper and to make projections for Gam, we require a full-sky, three-dimensional extinction 
map. 

We perform a simple combination of the extinction maps of Marshall et al. (2006) (based on 2MASS data) and 
Green et al. (2015) (based on Pan-STARRS and 2MASS data) and fill in the remaining parts of the sky with the map 
of Drimmel et al. (2003), a three component analytic model for the dust distribution fit to the GOBE DIRBE data and 
normalized to the map of Schlegel et al. (1998). Bovy et al. (2016) found by modeling the observed counts of RG stars 
near the midplane in APOGEE that the map of Marshall et al. (2006) performs better than that of Green et al. (2015) 
where they overlap, essentially because the latter underestimates the amount of extinction in regions of high extinction 
due to the paucity of main-sequence stars that they primarily use, while Marshall et al. (2006) mainly employ giant 
stars that are visible to larger distances. We do not use the map of Sale et al. (2014), as it led to inferior fits of the 
APOGEE stellar densities. 

Specifically, we follow the hierarchical HEALPix format of Green et al. (2015) and evaluate the Marshall et al. (2006) 
map, which exists over the range —100° < I < 100° and \b\ < 10°, on a grid of HEALPix pixel centers with A^side = 512 
or an approximate resolution of 7'. We employ this resolution to slightly over sample the native resolution of 15'. We 
then add the hierarchical map of Green et al. (2015), which has a variable angular resolution of ~ 4' to 14', outside 
of the Marshall et al. (2006) area. The remaining area of the sky, mainly near the south celestial pole, is filled in 
with the map of Drimmel et al. (2003) at a HEALPix resolution of A'side = 256, which again somewhat oversamples 
the resolution of the Drimmel et al. (2003) map (~ 20'). We also fill in parts of A^side = 256 pixels that are partially 
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covered by the other two maps. 

The resulting map of Aq at 5kpc is displayed in Figure 1. We use AcjE^B — V)= 2.35 (Bovy et al. 2014). While 
this map is not perfect and the effect of stitching together the three different extinction maps is clearly visible, this 
map does an adequate job of describing the extinction over the full sky and is good enough for our purposes here. 
The most problematic region is the region of the Galactic plane covered by neither the Marshall et al. (2006) or the 
Green et al. (2015) map (250° < I < 260°), but in the high-latitude regions filling in uncovered regions with the model 
of Drimmel et al. (2003) performs well. 

To aid in the comparison of different 3D extinction maps we have made a code publicly available at 

http://github.com/j obovy/mwdust 

This code downloads and installs all of the necessary data and code to evaluate the extinction maps of Schlegel et al. 
(1998), Drimmel et al. (2003), Marshall et al. (2006), Sale et al. (2014), and Green et al. (2015) using a common 
interface. 
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